Conditional Random Fields For Local Adaptive Reference Extraction

Authors

  • Martin Toepfer
  • Peter Klügl
  • Andreas Hotho
  • Frank Puppe
Abstract

The accurate extraction of bibliographic information from scientific publications is an active field of research. Machine learning methods, especially sequence labeling approaches such as Conditional Random Fields (CRFs), are often applied to this reference extraction task, but they still suffer from the ambiguity of reference notation. A reference section, however, follows a predefined style guide and therefore contains only homogeneous references. Other references of the same paper or journal can thus often provide evidence of how the fields of a reference should be labeled. We propose a novel approach that exploits these similarities within a document. Our process model uses information from unlabeled documents directly during the extraction task in order to adapt automatically to the perceived style guide. This is implemented by changing the manifestation of the features of the applied CRF. The experimental results show considerable improvements compared to the common, non-adaptive approach: we achieve an average F1 score of 96.7% and an instance accuracy of 85.4% on the test data set.
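The idea of exploiting homogeneous references within one document can be illustrated, very loosely, with a minimal Python sketch. It assumes a two-pass setup that is not taken from the paper: a first CRF pass labels every reference of a document, and for each token the labels that follow the same preceding token in the *other* references of that document are then offered as extra features for a second pass. The "preceding token" style cue, the two-pass design, the function names (`context_key`, `document_local_features`, etc.) and the toy labels are all hypothetical assumptions, not the authors' method.

```python
# Sketch only: a hypothetical two-pass, document-local feature scheme,
# NOT the method described in the paper.
from collections import Counter, defaultdict
from typing import Dict, List

BOS = "<BOS>"  # marker used as the "previous token" of the first position


def context_key(tokens: List[str], i: int) -> str:
    """Hypothetical style cue: the lower-cased token that precedes position i."""
    return tokens[i - 1].lower() if i > 0 else BOS


def count_labels_after_context(tokens: List[str], labels: List[str]) -> Dict[str, Counter]:
    """For one reference, count which label follows each preceding-token context."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i, label in enumerate(labels):
        counts[context_key(tokens, i)][label] += 1
    return counts


def document_local_features(
    references: List[List[str]], first_pass: List[List[str]]
) -> List[List[Dict[str, object]]]:
    """Build per-token feature dicts for all references of ONE document.

    Besides ordinary local features, each token gets the label that most often
    follows the same preceding token in the *other* references of the document,
    according to a first-pass labeling (leave-one-reference-out, so a reference
    never votes for itself).
    """
    per_ref = [count_labels_after_context(toks, labs) for toks, labs in zip(references, first_pass)]

    # Sum the per-reference counts into document-wide counts.
    total: Dict[str, Counter] = defaultdict(Counter)
    for counts in per_ref:
        for key, label_counts in counts.items():
            total[key].update(label_counts)

    features: List[List[Dict[str, object]]] = []
    for tokens, own in zip(references, per_ref):
        ref_feats = []
        for i, tok in enumerate(tokens):
            key = context_key(tokens, i)
            feats: Dict[str, object] = {
                "lower": tok.lower(),
                "is_digit": tok.isdigit(),
                "prev": key,
            }
            # Counter subtraction drops non-positive counts, leaving only
            # evidence contributed by the other references.
            others = total[key] - own[key]
            if others:
                label, n = others.most_common(1)[0]
                feats["doc_label_after_prev"] = label
                feats["doc_label_ratio"] = n / sum(others.values())
            ref_feats.append(feats)
        features.append(ref_feats)
    return features


if __name__ == "__main__":
    # Toy document: three references in one consistent (made-up) style.
    references = [
        ["Smith", "J", "2009", ":", "Mining", "text"],
        ["Doe", "A", "2008", ":", "Parsing", "refs"],
        ["Roe", "B", "2007", ":", "Deep", "nets"],
    ]
    # Hypothetical first-pass CRF output; "Deep" is mislabeled as AUTHOR.
    first_pass = [
        ["AUTHOR", "AUTHOR", "DATE", "O", "TITLE", "TITLE"],
        ["AUTHOR", "AUTHOR", "DATE", "O", "TITLE", "TITLE"],
        ["AUTHOR", "AUTHOR", "DATE", "O", "AUTHOR", "TITLE"],
    ]
    feats = document_local_features(references, first_pass)
    print(feats[2][4]["doc_label_after_prev"], feats[2][4]["doc_label_ratio"])  # TITLE 1.0
```

In the toy document, the other two references both put a TITLE after ":", so the mislabeled token "Deep" receives `doc_label_after_prev = "TITLE"`. A second-pass CRF trained with such features could learn to trust this document-level evidence over the first-pass guess. The list-of-dicts-per-sequence layout is the kind of input that CRF toolkits such as python-crfsuite accept, though the exact feature encoding is again an assumption.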

Similar resources

Reference String Extraction Using Line-Based Conditional Random Fields

The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publ...

Heterogeneous Web Data Extraction Algorithm Based On Modified Hidden Conditional Random Fields

As it is of great importance to extract useful information from heterogeneous Web data, in this paper, we propose a novel heterogeneous Web data extraction algorithm using a modified hidden conditional random fields model. Considering the traditional linear chain based conditional random fields can not effectively solve the problem of complex and heterogeneous Web data extraction, we modify the...

Relationship Extraction from Biomedical Documents using Conditional Random Fields

Extracting complex relationships automatically from unstructured information resources is a challenging problem. It is an important problem in this present age of abundant machine processable information as there is a need to build intelligent knowledge-aware applications for tasks such as search, extraction and reasoning. We have used Conditional Random Fields (CRFs) to identify various relations...

Techniques d'apprentissage supervisé pour l'extraction d'événements TimeML en anglais et français

Identifying events from texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and TempEval challenges, it has received some attention in the last years, yet, no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems, combining for instance Conditional Random Fields, l...

Conditional Random Fields for XML Trees

We present XML Conditional Random Fields (XCRFs), a framework for building conditional models to label XML data. XCRFs are Conditional Random Fields over unranked trees (where every node has an unbounded number of children). The maximal cliques of the graph are triangles consisting of a node and two adjacent children. We equip XCRFs with efficient dynamic programming algorithms for inference an...

Publication date: 2010